class: title-slide <br> <br> # A Data Mining Approach for Detecting Collusion in Unproctored Online Exams<br> .padding_left.pull-down.white[ J. Langerbein, T. Massing, .bold[_J. Klenke_], N. Reckmann, M. Striewe, M. Goedicke, C. Hanck <br> <br> <br> `\(16^{th}\)` International Conference on Educational Data Mining Bangalore, July 11-14, 2023 ] --- name: introduction # Introduction * COVID-19 forced universities to switch to online exams * Proctoring online exams with video conference software was prohibited by our university * We conducted take-home exams as open-book, but collaboration was strictly prohibited * Hierarchical clustering algorithms are used to identify groups of potentially colluding students * The method successfully finds groups with similar behavior in the exams * A proctored comparison group helps categorize students as _uncommonly similar_ <!-- outstandingly, Paper wording? --> --- name: related_work <h1> Related work </h1> .font80[ * Limited research exists on unproctored exams at universities <!-- prior to the pandemic --> * <a href='#bib-cleophas2021s'>Cleophas et al. (2021)</a> propose a method using event logs to detect collusion in unproctored exams * Previous studies focused on similarity measures for programming exams based on keyboard patterns, e.g. <a href='#bib-Hellas_2017'>Hellas et al. (2017)</a> and <a href='#bib-Leinonen_2016'>Leinonen et al. (2016)</a> * Other literature (e.g. 
<a href='#bib-hemming2010online'>Hemming (2010)</a>) relies on surveys or interviews, lacking actual student behavior data on collusion * Some studies suggest that unsupervised online exams may lead to collusion * <a href='#bib-hollister2009proctored'>Hollister and Berenson (2009)</a> used GPA and final exam scores to analyze collusion, but no data was collected during the exam ] -- .pull-down.blockquote[ <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#004c93;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 8c137 0 248 111 248 248S393 504 256 504 8 393 8 256 119 8 256 8zm-28.9 143.6l75.5 72.4H120c-13.3 0-24 10.7-24 24v16c0 13.3 10.7 24 24 24h182.6l-75.5 72.4c-9.7 9.3-9.9 24.8-.4 34.3l11 10.9c9.4 9.4 24.6 9.4 33.9 0L404.3 273c9.4-9.4 9.4-24.6 0-33.9L271.6 106.3c-9.4-9.4-24.6-9.4-33.9 0l-11 10.9c-9.5 9.6-9.3 25.1.4 34.4z"></path></svg> **Aim of the Paper** Categorize students with a hierarchical clustering algorithm on event logs of our statistical exam and strengthen the analysis with a comparison to a proctored exam group ] --- name: methodology <h1> Methodology — <span style="font-size: 0.8em;"> Set-up </span> </h1> * Data for the study are collected from the *Descriptive Statistics* course at the University of Duisburg-Essen, Germany * The exams consist of arithmetical problems, programming tasks in `R`, and a short essay task * Both exams are conducted digitally with the e-assessment system [JACK](https://www.uni-due.de/zim/services/jack.php) - Each student receives different randomized numerical values across all tasks - Event logs capture students' activities, time stamps, and points during the exams for every subtask * The test group took the unproctored exam at home during the COVID-19 pandemic * The comparison group took a proctored exam in the facilities of the university * Data cleaning is conducted, removing students with minimal participation or achievement and students with internet problems * The 
difference between the two courses is marginal --- <h1> Methodology — <span style="font-size: 0.8em;"> Data set </span> </h1> <br> <table> <thead> <tr> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; padding-right: 4px; padding-left: 4px; background-color: #ffffff !important;" colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">$$$$</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; padding-right: 4px; padding-left: 4px; background-color: #ffffff !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Groups</div></th> </tr> <tr> <th style="text-align:left;font-weight: bold;background-color: #ffffff !important;"> </th> <th style="text-align:center;font-weight: bold;background-color: #ffffff !important;"> Comparison </th> <th style="text-align:center;font-weight: bold;background-color: #ffffff !important;"> Test </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;min-width: 10em; font-weight: bold;"> Year </td> <td style="text-align:center;"> 18/19 </td> <td style="text-align:center;"> 20/21 </td> </tr> <tr> <td style="text-align:left;min-width: 10em; font-weight: bold;"> N </td> <td style="text-align:center;"> 109 </td> <td style="text-align:center;"> 151 </td> </tr> <tr> <td style="text-align:left;min-width: 10em; font-weight: bold;"> Style </td> <td style="text-align:center;"> proctored </td> <td style="text-align:center;"> unproctored </td> </tr> <tr> <td style="text-align:left;min-width: 10em; font-weight: bold;"> Total points </td> <td style="text-align:center;"> 60 </td> <td style="text-align:center;"> 60 </td> </tr> <tr> <td style="text-align:left;min-width: 10em; font-weight: bold;"> Subtasks </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 17 </td> </tr> <tr> <td style="text-align:left;min-width: 10em; font-weight: bold;"> Duration 
</td> <td style="text-align:center;"> 60 </td> <td style="text-align:center;"> 60 </td> </tr> </tbody> </table> --- <h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1> * Agglomerative (bottom-up) hierarchical clustering algorithm * Global pairwise dissimilarities: `$$D(x_i, x_{i'}) = \frac{1}{h} \sum_{j=1}^h w_j \cdot d_j(x_{ij}, x_{i'j})$$` <br> -- .pull-left[ .font90[ * With * `\(\displaystyle \sum_{j=1}^h w_j = 1\)` * `\(d_j(x_{ij}, x_{i'j})\)` pairwise attribute dissimilarity * `\(i = 1, ..., N\)` students * `\(j = 1, ..., h\)` attributes ] ] -- .pull-right[ .font90[ * We compare two different kinds of attributes * Dissimilarities in the students' event patterns (time of submission) * Dissimilarities in points achieved ] ] --- <h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1> <br> Dissimilarities in the students' event patterns (time of submission) for each task `\(j\)` `$$d_j^L(v_{ij}, v_{i'j}) = \sum_{m=1}^{K=70} | v_{ijm} - v_{i'jm} |$$` * With * Examination is divided into `\(m = 1, ... 
, 70\)` time intervals * `\(v_{ijm}\)` denotes the count of answers of student `\(i\)` for task `\(j\)` in the `\(m\)`-th interval * Manhattan metric is used --- <h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1> <br> Dissimilarities in points achieved for each task `\(j\)` `$$d_j^P(s_{ij}, s_{i'j}) = | s_{ij} - s_{i'j} |$$` * With * `\(s_{ij}\)` denotes the points achieved by student `\(i\)` in the `\(j\)`-th subtask * Manhattan metric is used --- <h1> Methodology — <span style="font-size: 0.8em;">Full Model </span> </h1> `$$D(s_i, s_{i'}, v_i, v_{i'}) = \dfrac{1}{h} \sum_{j=1}^h \left(w_j^P \cdot d_j^P (s_{ij}, s_{i'j}) + w_j^L \cdot d_j^L (v_{ij}, v_{i'j}) \right)$$` * Weights `\(w_j^P\)` and `\(w_j^L\)` control the influence of each attribute on the global object dissimilarity * `\(\displaystyle \sum_{j=1}^h \left(w_j^P + w_j^L\right) = 1\)` -- * We reduce the weights for * `R` tasks, as these tasks have more noise * Essay questions, as comparisons on this kind of task are limited * Points achieved -- * Since dissimilarity measures depend on scale, the attributes are **normalized** --- <h2>Empirical results — <span style="font-size: 0.8em;">Dendrogram</span></h2> .panelset[ .panel[.panel-name[Control] <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/dendogram_control.png" alt="<strong> Figure 1: </strong> Dendrogram produced by average linkage clustering of the proctored control group (2018/19). <strong> G-L </strong> mark the clusters with the lowest dissimilarity" width="100%" /> <p class="caption"><strong> Figure 1: </strong> Dendrogram produced by average linkage clustering of the proctored control group (2018/19). 
<strong> G-L </strong> mark the clusters with the lowest dissimilarity</p> </div> ] .panel[.panel-name[Test] <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/dendogram_test.png" alt="<strong>Figure 2:</strong> Dendrogram produced by average linkage clustering of the unproctored test group (2020/21). <strong> A-F </strong> mark the clusters with the lowest dissimilarity." width="100%" /> <p class="caption"><strong>Figure 2:</strong> Dendrogram produced by average linkage clustering of the unproctored test group (2020/21). <strong> A-F </strong> mark the clusters with the lowest dissimilarity.</p> </div> ] .panel[.panel-name[Findings] ### Findings - Lower level of dissimilarity in test group - Clusters **A**, **B** and **E** standing out ] ] --- <h2>Empirical results — <span style="font-size: 0.8em;">Distribution of measured distances </span> </h2> <br> <br> .pull-left[ <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/boxplot_original.png" alt="<strong>Figure 3.1:</strong> Comparison of the distance measures." width="100%" height="60%" /> <p class="caption"><strong>Figure 3.1:</strong> Comparison of the distance measures.</p> </div> ] .pull-right[ <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/boxplot_norm.png" alt="<strong>Figure 3.2:</strong> Comparison of the normalised distance measures." 
width="100%" height="60%" /> <p class="caption"><strong>Figure 3.2:</strong> Comparison of the normalised distance measures.</p> </div> ] --- <h2>Empirical results — <span style="font-size: 0.8em;">Cluster comparison</span></h2> .panelset[ .panel[.panel-name[AB] <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/plot_ab.png" alt="<strong>Figure 4.1:</strong> Comparison of the event logs and achieved points of clusters <strong>A</strong> and <strong>B</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" /> <p class="caption"><strong>Figure 4.1:</strong> Comparison of the event logs and achieved points of clusters <strong>A</strong> and <strong>B</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask.</p> </div> ] .panel[.panel-name[CD] <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/plot_cd.png" alt="<strong>Figure 4.2:</strong> Comparison of the event logs and achieved points of clusters <strong>C</strong> and <strong>D</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" /> <p class="caption"><strong>Figure 4.2:</strong> Comparison of the event logs and achieved points of clusters <strong>C</strong> and <strong>D</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask.</p> </div> ] .panel[.panel-name[EF] <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/plot_ef.png" alt="<strong>Figure 4.3:</strong> Comparison of the event logs and achieved points of clusters <strong>E</strong> and <strong>F</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask." 
width="80%" /> <p class="caption"><strong>Figure 4.3:</strong> Comparison of the event logs and achieved points of clusters <strong>E</strong> and <strong>F</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask.</p> </div> ] .panel[.panel-name[GH] <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/plot_gh.png" alt="<strong>Figure 4.4:</strong> Comparison of the event logs and achieved points of clusters <strong>G</strong> and <strong>H</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" /> <p class="caption"><strong>Figure 4.4:</strong> Comparison of the event logs and achieved points of clusters <strong>G</strong> and <strong>H</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask.</p> </div> ] .panel[.panel-name[IJ] <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/plot_ij.png" alt="<strong>Figure 4.5:</strong> Comparison of the event logs and achieved points of clusters <strong>I</strong> and <strong>J</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" /> <p class="caption"><strong>Figure 4.5:</strong> Comparison of the event logs and achieved points of clusters <strong>I</strong> and <strong>J</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask.</p> </div> ] .panel[.panel-name[KL] <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../resources/graphics/plot_kl.png" alt="<strong>Figure 4.6:</strong> Comparison of the event logs and achieved points of clusters <strong>K</strong> and <strong>L</strong> from the control group (2018/19). 
Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" /> <p class="caption"><strong>Figure 4.6:</strong> Comparison of the event logs and achieved points of clusters <strong>K</strong> and <strong>L</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask.</p> </div> ] ] --- name: discussion # Discussion * Three notable clusters (**A**, **B**, and **E**) consisting of two students each * Collusion in larger groups is not found * These results also hold under other linkage methods and parameter specifications, such as different weightings * The approach provides a basis for the examination of clusters based on comparison with a reference group * However, the ground truth is not known, limiting the certainty of conclusions * Nevertheless, the elevated risk of detection may indeed discourage students from cheating in unproctored exams --- # Further research * Investigating long-term effects of collusion analysis on student behavior * Impact on academic integrity? * Impact of AI on students' collusion behavior? * Development of a (better) decision rule to classify conspicuous clusters <!-- * Various hierarchical clustering algorithms exist * The data driven cophenetic correlation coefficient is used to assess the algorithms * Average linkage clustering most suitable * Dendrogram, providing a visual representation of the clustering results * Notable clusters are further investigated with scatterplots and barcharts * Comparison with the results from the comparison group supports the findings * Method successfully detects at least three clusters with near identical exams * Independent of the linkage method used * Important step in adapting to the progressing of digitization of education * Equips us better for unforeseen situations in the future * E.g.: pandemics, climate change impacts --> --- name: references # References Cleophas, C., C. Hoennige, F. Meisel, and P. Meyer (2021). 
"Who's Cheating? Mining Patterns of Collusion from Text and Events in Online Exams". In: _Mining Patterns of Collusion from Text and Events in Online Exams (April 12, 2021)_. Hellas, A., J. Leinonen, and P. Ihantola (2017). _Plagiarism in Take-Home Exams: Help-Seeking, Collaboration, and Systematic Cheating_. ITiCSE '17. Bologna, Italy: Association for Computing Machinery, p. 238–243. ISBN: 9781450347044. Hemming, A. (2010). "Online tests and exams: lower standards or improved learning?" In: _The Law Teacher_ 44.3, pp. 283-308. Hollister, K. K. and M. L. Berenson (2009). "Proctored versus unproctored online exams: Studying the impact of exam environment on student performance". In: _Decision Sciences Journal of Innovative Education_ 7.1, pp. 271-294. Leinonen, J., K. Longi, A. Klami, A. Ahadi, and A. Vihavainen (2016). _Typing patterns and authentication in practical programming exams_ , pp. 160-165.